Application of Quantum Chemical Topology Force Field FFLUX to Condensed Matter Simulations: Liquid Water

We present here the first application of the quantum chemical topology force field FFLUX to condensed matter simulations. FFLUX offers many-body potential energy surfaces learnt exclusively from ab initio data using Gaussian process regression. FFLUX also includes high-rank, polarizable multipole moments (up to quadrupole moments in this work) that are learnt from the same ab initio calculations as the potential energy surfaces. Many-body effects (where a body is an atom) and polarization are captured by the machine learning models. The choice to use machine learning in this way allows the force field’s representation of reality to be improved (e.g., by including higher order many-body effects) with little detriment to the computational scaling of the code. In this manner, FFLUX is inherently future-proof. The “plug and play” nature of the machine learning models also ensures that FFLUX can be applied to any system of interest, not just liquid water. In this work we study liquid water across a range of temperatures and compare the predicted bulk properties to experiment as well as other state-of-the-art force fields AMOEBA(+CF), HIPPO, MB-Pol and SIBFA21. We find that FFLUX finds a place amongst these.


Additional S-curves
This section gives the S-curves for the 3 components of the dipole moment and the 5 components of the quadrupole moment. Note that the multipole moments are learnt and predicted in spherical tensor form. Subsequently they are converted to Cartesian tensors inside the program DL_FFLUX. As such the S-curves are given for the components in spherical tensor form.

Note on time steps
The timestep chosen for DL_FFLUX runs was 1 fs, which represents a middle ground compared to the timesteps used by the four other force fields that we compare with, ranging from 0.2 fs to 2 fs. In particular, properties obtained with AMOEBA+ using 0.5 fs and 2 fs differ little, while our own test on the IR spectrum using 0.5 fs showed an improvement that was too marginal to justify the computational cost. We note that MB-pol uses 0.2 fs, which is very small, while SIBFA21 uses 1 fs and HIPPO 2 fs. Very few properties are known to be overly senstitive to timestep but researchers often reduce the timestep from say 1 fs to 0.5 fs for the IR spectrum.

Electrostatic convergence
Convergence has two meanings in the context of the multipole expansion. The first relates to the formal mathematical condition that, if satisfied, ensures the expansion is well defined and thus guaranteed to converge. If this condition is violated (this may happen when charge distributions overlap) then the expansion may diverge. The other kind of convergence relates to the truncation of the infinite series at a finite number of terms. A study of this kind of convergence examines how the series approaches the infinite limit value (the 'true' value) as an increasing number of terms are included. A considerable amount of work has been done examining this kind of convergence 1-3 . The results presented here build on the work of Rafat and Popelier 4 , which examines the convergence of the multipole expansion in the case of a water dimer in the geometry of the global energy minimum. Specifically the hydrogen bond O-H interaction was studied in refernce 4. This work served as a useful benchmark during the development of the DL_FFLUX code. Figure 1 of reference 4 (Rafat and Popelier) is a key plot that is partially (up to L=5) reproduced in Figure  S9 with some added data of our won. Figure S9 shows the electrostatic energy of the hydrogen bond in the water dimer at various values of L=l A +l B +1 where l A and l B are the rank of the multipole moments on a pair of interacting atoms A and B, respectively. L determines which electrostatic interactions are calculated, e.g. at L=2 charge-charge and charge-dipole interactions are calculated while L=5 includes terms up to chargehexadecapole. The horizontal red line in Figure S9 is the true value of the electrostatic interaction, which is determined via a 6-dimensional integral carried out by the program AIMAll. All of the data (multipole moments, geometries etc.) were plugged into DL_FFLUX in order to check whether DL_FFLUX reproduced the curve of Rafat and Popelier. This was a highly important validation step as the two curves, that overlap perfectly in Figure S9, are produced by entirely different computer programs. This test is a crucial validation of the DL_FFLUX code.  4 . The red horizontal line is the result of the 6D integration using the program AIMAll. The blue curve is taken from the Rafat paper (data truncated at L=5). The FFLUX curve was produced using the same inputs as Rafat but using the DL_FFLUX code.
The work of Rafat and Popelier in Figure S9 shows that relatively high rank multipole moments (octopole and hexadecapole) are required for the expansion to converge well. In order to determine whether or not this finding generalises across many dimer configurations, not just the global minimum, another test using the in-house program PROMETHEUS was carried out. This test used the same 100 water dimers as described in the main text. The convergence of the multipole expansion was assessed by comparing the 6D integration energy of the electrostatic interaction between each pair of atoms in the dimer (9 pairs in total) to the energy as computed using the multipole expansion. The multipole moments that are fed into the multipole expansion are computed from the dimer geometries as follows. Firstly, a wavefunction for each dimer was computed using DFT at the B3LYP/aug-cc-pVTZ level of theory with the program GAUSSIAN09. The wavefunction was then fed to the program AIMAll, which computes the atomic multipole moments. In other words, QCT multipole moments are fed into the expansion and compared against the QCT true energy. The expansion is truncated at various points and the difference between the true and expansion energies computed. Figure S10 shows how well the multipole expansion converges at various values of L'. Note that we are now using the L' convention rather than the L convention mentioned above. A convergence criterion of 1 kJ/mol at such short range is a stringent test. The higher the rank of the moments involved the interaction, the more quickly the term dies off as a function of distance. The proportionality to distance is R -L . As such, the short range is where the higher order terms will still be relevant. The largest allowed oxygen-oxygen distance in the 100 dimers is 3.5 Å (chosen as it encompasses the first solvation shell) which is very much short range. Figure S11 shows the same graph with the convergence criterion relaxed to 1 kcalmol -1 . In both Figures S10 and S11 it is evident that hydrogen atoms are better described than oxygen atoms by lower rank multipole moments. This is unsurprising given there is considerably less electron density on hydrogen atoms. It is also clear that, even with a convergence criterion of 1 kcalmol -1 , the electrostatics is poorly described at the level of point charge only. In order to get a good description of short-range electrostatics, at minimum dipole and quadrupole moments are essential. If near perfect convergence is the goal then even octopole moments are required. However, this comes at considerable computational cost and arguably may not be worth the gain. This work shows that the findings of Rafat and Popelier 4 generalise to configurations of the water dimer beyond just the global minimum. These findings are in line with the push towards including high rank multipole moments in many state-of-the-art force fields.      Figure S13 demonstrates that the 3 rd order polynomial is an excellent fit to the data.

Infrared spectrum
As reported in the main text, there are several choices for the quantum correction factor. These are shown in Equations S1, S2 and S3. The exact choice is relatively arbitrary and it is generally the case that the factor that produces the best spectrum is chosen. We find in this work that all of the factors give essentially the same spectrum. (S1)

Radial distribution functions
The following sections provide the raw data for each of the RDFs plotted in the main text.